Abstract

A closed set of a Euclidean space is said to be Chebyshev if every point in the space has one and 
only one closest point in the set. Although the situation is not settled in infinite-dimensional 
Hilbert spaces, in 1932 Bunt showed that in Euclidean spaces a closed set is Chebyshev if and 
only if the set is convex. In this paper, from the more general perspective of Bregman distances, 
we show that if every point in the space has a unique nearest point in a closed set, then the 
set is convex. We provide two approaches: one is by nonsmooth analysis; the other by maximal 
monotone operator theory. Subdifferentiability properties of Bregman nearest distance functions 
are also given. 
1 Introduction 

Throughout, $\mathbb{R}^J$ is the standard Euclidean space with inner product $\langle\cdot,\cdot\rangle$ and induced norm $\|\cdot\|$, 
and $\Gamma$ is the set of proper lower semicontinuous convex functions on $\mathbb{R}^J$. Let $C$ be a nonempty 
closed subset of $\mathbb{R}^J$. If each $x\in\mathbb{R}^J$ has a unique nearest point in $C$, the set $C$ is called Chebyshev. 
The famous Chebyshev set problem inquires: "Is a Chebyshev set necessarily convex?". It has 
been studied by many authors, see [TJ [6j [321 HU [TJ [I5j [TU CLZ] and the references therein. Although 
the problem was answered in the affirmative by Bunt in 1932, we look at it from the more general point 
of view of Bregman distances. 



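Let 

(1)  $f\colon \mathbb{R}^J \to \left]-\infty,+\infty\right]$ be convex and differentiable on $U := \operatorname{int}\operatorname{dom} f \neq \varnothing$. 

The Bregman distance associated with $f$ is defined by 

(2)  $D\colon \mathbb{R}^J\times\mathbb{R}^J \to [0,+\infty] : (x,y)\mapsto \begin{cases} f(x)-f(y)-\langle\nabla f(y),x-y\rangle, & \text{if } y\in U;\\ +\infty, & \text{otherwise.}\end{cases}$ 
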

Assume that $C\subset U$. It is a natural generalization of the Chebyshev problem to ask the following: 

"If every $x\in U$ has, in terms of the Bregman distance, a unique nearest point in 
$C$, i.e., $C$ is Chebyshev for the Bregman distance, must $C$ be convex?" 

We give two approaches to our affirmative answer: one uses beautiful properties of maximal 
monotone operators: Rockafellar's virtual convexity theorem on ranges of maximal monotone operators; 
the other uses generalized subdifferentials from nonsmooth analysis, which allows us to characterize 
Chebyshev sets. We also study subdifferentiabilities of Bregman distance functions associated to 
closed sets. These nonsmooth analysis results are interesting in their own right, since Bregman 
distances have found tremendous applications in Statistics, Engineering, and Optimization; see the 
recent books [U [9] and the references therein. 

The function $D$ does not define a metric, since it is not symmetric and does not satisfy the 
triangle inequality. It is thus remarkable that it is not only possible to derive many results on 
projections and distances similar to those obtained in finite-dimensional Euclidean spaces, but 
also to provide a general framework for best approximations. 

The paper is organized as follows. In Section 2, we state our assumptions on $f$ and provide 
some concrete choices. In Section 3, we characterize left Bregman nearest points and geodesics. 
We show that the Bregman normal is a proximal normal. In Section 4, when $f$ is Legendre and 
1-coercive and $C$ is Chebyshev, we show that the composition of the Bregman nearest-point map 
and $\nabla f^*$ is maximal monotone. This allows us to apply Rockafellar's theorem on the virtual convexity 
of the range of a maximal monotone operator to obtain that a Chebyshev set is convex. In Section 5, 
we study subdifferentiability properties of the left Bregman distance function. Formulas for the Clarke 
subdifferential, the limiting subdifferential and the Dini subdifferential are given. In Section 6, we 
give a complete characterization of Chebyshev sets. Our approach generalizes the results given by 
Hiriart-Urruty [17], [15] from the Euclidean to the Bregman setting. Finally, in Section 7, we show 
that the convexity of Chebyshev sets for right Bregman projections of $f$ can be studied by using 
the left Bregman projections of $f^*$. We give an example showing that even if the right Bregman 
projection is single-valued, the set $C$ need not be convex. 

Notation: In $\mathbb{R}^J$, the closed ball centered at $x$ with radius $\delta>0$ is denoted by $B_\delta(x)$ and the 
closed unit ball is $B = B_1(0)$. For a set $S$, the expressions $\operatorname{int}S$, $\operatorname{cl}S$, $\operatorname{conv}S$ signify the interior, 
closure, and convex hull of $S$ respectively. For a set-valued mapping $T\colon \mathbb{R}^J\rightrightarrows\mathbb{R}^J$, we use $\operatorname{ran}T$ and 
$\operatorname{dom}T$ for its range and domain, and $T^{-1}$ for its set-valued inverse, i.e., $x\in T^{-1}(y) \Leftrightarrow y\in T(x)$. 
For a function $f\colon \mathbb{R}^J\to\left]-\infty,+\infty\right]$, $\operatorname{dom}f$ is the domain of $f$, and $f^*$ is its Fenchel conjugate; 
$\operatorname{conv}f$ ($\operatorname{cl}\operatorname{conv}f$) denotes the convex hull (closed convex hull) of $f$. For a differentiable function 
$f$, $\nabla f(x)$ and $\nabla^2 f(x)$ denote the gradient vector and the Hessian matrix at $x$. Our notation is 
standard and follows, e.g., [20, 21]. 

2 Standing Assumptions and Examples 

From now on, and until the end of Section 6, our standing assumptions on $f$ and $C$ are: 

A1 $f\in\Gamma$ is a convex function of Legendre type, i.e., $f$ is essentially smooth and essentially 
strictly convex in the sense of [20, Section 26]. 

A2 $f$ is 1-coercive, i.e., $\lim_{\|x\|\to+\infty} f(x)/\|x\| = +\infty$. An equivalent requirement is $\operatorname{dom}f^* = \mathbb{R}^J$ (see 
[21, Theorem 11.8(d)]). 

A3 The set $C$ is a nonempty closed subset of $U$. 

Important instances of functions satisfying the above conditions are: 
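Example 2.1 Let $x=(x_j)_{1\le j\le J}$ and $y=(y_j)_{1\le j\le J}$ be two points in $\mathbb{R}^J$. 

(i) Energy: If $f\colon x\mapsto \tfrac12\|x\|^2$, then $U=\mathbb{R}^J$, 

    $D(x,y) = \tfrac12\|x-y\|^2$ 

and $\nabla^2 f(x) = \operatorname{Id}$ for every $x\in\mathbb{R}^J$. Note that $f^*(x) = \tfrac12\|x\|^2$, $\operatorname{dom}f^* = \mathbb{R}^J$, and $\nabla^2 f^* = \operatorname{Id}$. 

(ii) Boltzmann-Shannon entropy: If $f\colon x\mapsto \sum_{j=1}^J x_j\ln(x_j) - x_j$ if $x\ge 0$, $+\infty$ otherwise. Here 
$x\ge 0$ means $x_j\ge 0$ for $1\le j\le J$ and similarly for $x>0$, and $0\ln 0 = 0$. Then $U=\{x\in\mathbb{R}^J : x>0\}$, and 

    $D(x,y) = \begin{cases} \sum_{j=1}^J x_j\ln(x_j/y_j) - x_j + y_j, & \text{if } x\ge 0 \text{ and } y>0;\\ +\infty, & \text{otherwise}\end{cases}$ 

is the so-called Kullback-Leibler divergence. Note that 

    $\nabla^2 f(x) = \operatorname{diag}(1/x_1, 1/x_2, \ldots, 1/x_J)$  for every $x>0$, 

that $f^*(x) = \sum_{j=1}^J e^{x_j}$ with $\operatorname{dom}f^* = \mathbb{R}^J$, and that 

    $\nabla^2 f^*(x) = \operatorname{diag}(e^{x_1}, e^{x_2}, \ldots, e^{x_J})$. 

(iii) Fermi-Dirac entropy: If $f\colon x\mapsto \sum_{j=1}^J x_j\ln x_j + (1-x_j)\ln(1-x_j)$, then $U=\{x\in\mathbb{R}^J : 0<x<1\}$ and 

    $D(x,y) = \begin{cases} \sum_{j=1}^J x_j\ln(x_j/y_j) + (1-x_j)\ln\big((1-x_j)/(1-y_j)\big), & \text{if } 1\ge x\ge 0 \text{ and } 1>y>0;\\ +\infty, & \text{otherwise.}\end{cases}$ 

While 

    $\nabla^2 f(x) = \operatorname{diag}\Big(\frac{1}{x_1(1-x_1)}, \frac{1}{x_2(1-x_2)}, \ldots, \frac{1}{x_J(1-x_J)}\Big)$  for every $0<x<1$, $x\in\mathbb{R}^J$, 

we have $f^*(x) = \sum_{j=1}^J \ln(1+e^{x_j})$ with 

    $\nabla^2 f^*(x) = \operatorname{diag}\Big(\frac{e^{x_1}}{(1+e^{x_1})^2}, \frac{e^{x_2}}{(1+e^{x_2})^2}, \ldots, \frac{e^{x_J}}{(1+e^{x_J})^2}\Big)$  for every $x\in\mathbb{R}^J$. 

(iv) In general, we can let $f\colon x\mapsto \sum_{j=1}^J \phi(x_j)$ where $\phi\colon \mathbb{R}\to\left]-\infty,+\infty\right]$ is a Legendre function. 
Then $U = (\operatorname{int}\operatorname{dom}\phi)^J$, 

    $D(x,y) = \sum_{j=1}^J \phi(x_j) - \phi(y_j) - \phi'(y_j)(x_j-y_j)$,  for every $x\in\mathbb{R}^J$, $y\in U$. 

In particular, one can use $\phi(t) = |t|^p/p$ with $p>1$. 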
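
To make the examples above concrete, here is a minimal Python sketch that evaluates $D(x,y) = f(x) - f(y) - \langle\nabla f(y), x-y\rangle$ for the energy, the Boltzmann-Shannon entropy and the Fermi-Dirac entropy. The function names (`bregman`, `kullback_leibler`, and so on) and the sample points are illustrative choices, not part of the paper; the sketch assumes $y$ lies in $U$ so that the gradient is defined.

```python
import math

def xlogx(t):
    """t*ln(t) with the convention 0*ln(0) = 0."""
    return 0.0 if t == 0.0 else t * math.log(t)

def bregman(f, grad_f, x, y):
    """D(x, y) = f(x) - f(y) - <grad f(y), x - y>, assuming y lies in U."""
    g = grad_f(y)
    return f(x) - f(y) - sum(gj * (xj - yj) for gj, xj, yj in zip(g, x, y))

# (i) Energy: f(x) = ||x||^2 / 2
def energy(x):
    return 0.5 * sum(t * t for t in x)

def grad_energy(x):
    return list(x)

# (ii) Boltzmann-Shannon entropy: f(x) = sum x_j ln x_j - x_j
def entropy(x):
    return sum(xlogx(t) - t for t in x)

def grad_entropy(x):
    return [math.log(t) for t in x]

# (iii) Fermi-Dirac entropy: f(x) = sum x_j ln x_j + (1 - x_j) ln(1 - x_j)
def fermi_dirac(x):
    return sum(xlogx(t) + xlogx(1.0 - t) for t in x)

def grad_fermi_dirac(x):
    return [math.log(t / (1.0 - t)) for t in x]

def kullback_leibler(x, y):
    """Closed form of D for the Boltzmann-Shannon entropy (x >= 0, y > 0)."""
    return sum(xlogx(a) - a * math.log(b) - a + b for a, b in zip(x, y))

x, y = [0.2, 0.7], [0.5, 0.4]
d_energy = bregman(energy, grad_energy, x, y)          # equals ||x - y||^2 / 2
d_kl = bregman(entropy, grad_entropy, x, y)            # matches kullback_leibler(x, y)
d_kl_reversed = bregman(entropy, grad_entropy, y, x)   # differs: D is not symmetric
d_fd = bregman(fermi_dirac, grad_fermi_dirac, x, y)
```

Comparing `d_kl` with `d_kl_reversed` illustrates numerically that $D$ is not symmetric in general.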



The following result (see [20, Theorem 26.5]) plays an important role in the sequel. 

Fact 2.2 (Rockafellar) A convex function $f$ is of Legendre type if and only if $f^*$ is. In this case, 
the gradient mapping 

    $\nabla f\colon U \to \operatorname{int}\operatorname{dom}f^* : x\mapsto \nabla f(x)$, 

is a topological isomorphism with inverse mapping $(\nabla f)^{-1} = \nabla f^*$. 
3 Bregman Distances and Projection Operators 

We start with 

Definition 3.1 The left Bregman nearest-distance function to $C$ is defined by 

(3)  $\overleftarrow{D}_C\colon U\to[0,+\infty] : y\mapsto \inf_{x\in C} D(x,y)$, 

and the left Bregman nearest-point map (i.e., the classical Bregman projector) onto $C$ is 

    $\overleftarrow{P}_C\colon U\rightrightarrows U : y\mapsto \operatorname{argmin}_{x\in C} D(x,y) = \{x\in C : D(x,y) = \overleftarrow{D}_C(y)\}$. 

The right Bregman distance and right Bregman projector onto $C$ are defined analogously and 
denoted by $\overrightarrow{D}_C$ and $\overrightarrow{P}_C$, respectively. Note that while in [3] the authors consider proximity operators 
associated with a convex set $C$, here our set $C$ need not be convex and we do not assume that $D(\cdot,\cdot)$ 
is jointly convex. 

We shall often need the following identity 

(4)  $D(c,y) - D(x,y) = f(c) - f(x) - \langle\nabla f(y), c-x\rangle$, 
which is an immediate consequence of the definition. 
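
For completeness, the short computation behind identity (4) can be written out as follows; it only expands the definition of $D$ for $c, x\in\operatorname{dom}f$ and $y\in U$, and introduces no assumption beyond that.

```latex
\begin{aligned}
D(c,y) - D(x,y)
  &= \bigl[f(c) - f(y) - \langle \nabla f(y),\, c - y\rangle\bigr]
   - \bigl[f(x) - f(y) - \langle \nabla f(y),\, x - y\rangle\bigr] \\
  &= f(c) - f(x) - \langle \nabla f(y),\, (c - y) - (x - y)\rangle \\
  &= f(c) - f(x) - \langle \nabla f(y),\, c - x\rangle .
\end{aligned}
```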

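Our first result characterizes the left Bregman nearest point. 

Proposition 3.2 Let $x\in C$ and $y\in U$. 

(i) Then 

(5)  $x\in\overleftarrow{P}_C(y) \;\Leftrightarrow\; D(c,x)\ge\langle\nabla f(y)-\nabla f(x), c-x\rangle \quad \forall c\in C$. 

If $C$ is convex, then 

(6)  $x\in\overleftarrow{P}_C(y) \;\Leftrightarrow\; 0\ge\langle\nabla f(y)-\nabla f(x), c-x\rangle \quad \forall c\in C$. 

(ii) Suppose that $x\in\overleftarrow{P}_C(y)$. Then the Bregman projection of 

(7)  $z_\lambda = \nabla f^*(\lambda\nabla f(y) + (1-\lambda)\nabla f(x))$ with $0\le\lambda<1$, 

on $C$ is a singleton with 

(8)  $\overleftarrow{P}_C(z_\lambda) = x$. 

If $C$ is convex, (8) holds for every $\lambda\ge 0$. 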
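Proof. (i): By definition, $x\in\overleftarrow{P}_C(y)$ if and only if 

    $0\le D(c,y)-D(x,y) \quad \forall c\in C$; 

equivalently, $f(c)-f(x)\ge\langle\nabla f(y),c-x\rangle$ by (4). Subtracting $\langle\nabla f(x),c-x\rangle$ from both sides, we 
obtain 

    $D(c,x)\ge\langle\nabla f(y)-\nabla f(x),c-x\rangle$. 

Hence (5) holds. 

The convex counterpart (6) is well known and follows, e.g., from [2, Proposition 3.16]. 

(ii): Assume that $x\in\overleftarrow{P}_C(y)$ and $z_\lambda = \nabla f^*(\lambda\nabla f(y) + (1-\lambda)\nabla f(x))$ with $0\le\lambda<1$. Then by (5), 

(9)  $D(c,x)\ge\langle\nabla f(y)-\nabla f(x),c-x\rangle \quad \forall c\in C$. 

Take $c\in C$. By Fact 2.2, $\nabla f\circ\nabla f^* = \operatorname{Id}$, we have 

(10)  $\langle\nabla f(z_\lambda)-\nabla f(x),c-x\rangle$ 
(11)  $\quad= \langle\nabla f\circ\nabla f^*(\lambda\nabla f(y) + (1-\lambda)\nabla f(x)) - \nabla f(x), c-x\rangle$ 
(12)  $\quad= \langle(\lambda\nabla f(y) + (1-\lambda)\nabla f(x)) - \nabla f(x), c-x\rangle$ 
(13)  $\quad= \lambda\langle\nabla f(y)-\nabla f(x),c-x\rangle$. 

If $\langle\nabla f(y)-\nabla f(x),c-x\rangle\le 0$, then 

    $\lambda\langle\nabla f(y)-\nabla f(x),c-x\rangle\le 0\le D(c,x)$; 

if $\langle\nabla f(y)-\nabla f(x),c-x\rangle\ge 0$, then using $0\le\lambda<1$ and (9), 

    $\lambda\langle\nabla f(y)-\nabla f(x),c-x\rangle\le\langle\nabla f(y)-\nabla f(x),c-x\rangle\le D(c,x)$. 

In either case, by (10)-(13) we have 

    $\langle\nabla f(z_\lambda)-\nabla f(x),c-x\rangle\le D(c,x) \quad \forall c\in C$. 

Hence $x\in\overleftarrow{P}_C(z_\lambda)$ by (5). We proceed to show that $\overleftarrow{P}_C(z_\lambda)$ is a singleton. If $\lambda=0$, then $z_\lambda=x$ and 
$\overleftarrow{P}_C(x)=\{x\}$ by strict convexity of $f$. It remains to consider the case $0<\lambda<1$. Let $\hat{x}\in\overleftarrow{P}_C(z_\lambda)$. 
Then $D(\hat{x},z_\lambda) = D(x,z_\lambda)$, which is 

    $f(\hat{x})-f(x)-\langle\nabla f(z_\lambda),\hat{x}-x\rangle = 0$, 

by (4). Using $z_\lambda = \nabla f^*(\lambda\nabla f(y) + (1-\lambda)\nabla f(x))$, we have 

    $f(\hat{x})-f(x)-\langle\lambda\nabla f(y) + (1-\lambda)\nabla f(x),\hat{x}-x\rangle = 0$, 

so that 

    $\lambda[f(\hat{x})-f(x)-\langle\nabla f(y),\hat{x}-x\rangle] + (1-\lambda)[f(\hat{x})-f(x)-\langle\nabla f(x),\hat{x}-x\rangle] = 0$, 

and 

    $\lambda[f(x)-f(\hat{x})-\langle\nabla f(y),x-\hat{x}\rangle] = (1-\lambda)[f(\hat{x})-f(x)-\langle\nabla f(x),\hat{x}-x\rangle]$. 

This gives, by (4), $\lambda[D(x,y)-D(\hat{x},y)] = (1-\lambda)D(\hat{x},x)$ and hence 

    $D(x,y)-D(\hat{x},y) = \frac{1-\lambda}{\lambda}D(\hat{x},x)$, 

since $1>\lambda>0$. If $\hat{x}\ne x$, then $D(\hat{x},x)>0$ by the strict convexity of $f$ so that $D(x,y)>D(\hat{x},y)$, 
and this contradicts that $x\in\overleftarrow{P}_C(y)$. Therefore, $\overleftarrow{P}_C(z_\lambda)=\{x\}$. 

When $C$ is convex, by (6), $x\in\overleftarrow{P}_C(y)$ if and only if 

(14)  $\langle\nabla f(y)-\nabla f(x),c-x\rangle\le 0 \quad \forall c\in C$. 

If $z_\lambda = \nabla f^*(\lambda\nabla f(y) + (1-\lambda)\nabla f(x))$ with $\lambda\ge 0$, then 

(15)  $\langle\nabla f(z_\lambda)-\nabla f(x),c-x\rangle = \langle\nabla f\circ\nabla f^*(\lambda\nabla f(y) + (1-\lambda)\nabla f(x)) - \nabla f(x), c-x\rangle$ 
(16)  $\quad= \lambda\langle\nabla f(y)-\nabla f(x),c-x\rangle\le 0$. 

By (6), $x\in\overleftarrow{P}_C(z_\lambda)$. Applying (8), we see that $x=\overleftarrow{P}_C(z_\lambda)$. Indeed, select $\lambda_1>\lambda$. Since 

(17)  $z_\lambda = \nabla f^*(\lambda\nabla f(y) + (1-\lambda)\nabla f(x)) \;\Rightarrow\; \nabla f(z_\lambda) = \nabla f(x) + \lambda(\nabla f(y)-\nabla f(x))$, 
(18)  $z_{\lambda_1} = \nabla f^*(\lambda_1\nabla f(y) + (1-\lambda_1)\nabla f(x)) \;\Rightarrow\; \nabla f(z_{\lambda_1}) = \nabla f(x) + \lambda_1(\nabla f(y)-\nabla f(x))$. 

Solve (18) for $\nabla f(y)-\nabla f(x)$ and put into (17) to get 

    $\nabla f(z_\lambda) = \Big(1-\frac{\lambda}{\lambda_1}\Big)\nabla f(x) + \frac{\lambda}{\lambda_1}\nabla f(z_{\lambda_1})$. 

This gives 

    $z_\lambda = \nabla f^*\big((1-\lambda/\lambda_1)\nabla f(x) + (\lambda/\lambda_1)\nabla f(z_{\lambda_1})\big)$. 

As $x\in\overleftarrow{P}_C(z_{\lambda_1})$, (8) applies. ■ 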

It is interesting to point out a connection to the proximal normal cone $N_C^P(x)$ of $C$ at $x\in C$. 
Recall that 

    $N_C^P(x) := \{t(y-x) : t\ge 0,\ x\in P_C(y),\ y\in\mathbb{R}^J\}$, 

in which $P_C$ denotes the usual projection on $C$ in terms of the Euclidean norm, and each vector $t(y-x)$ 
is called a proximal normal to $C$ at $x$; see, e.g., [11, Section 1.1] for further information. 

Proposition 3.3 Suppose that $f$ is twice continuously differentiable on $U$, let $y\in U$, and suppose 
that $x\in\overleftarrow{P}_C(y)$. Then $\nabla f(y)-\nabla f(x)\in N_C^P(x)$. 
Proof. By Proposition 3.2(i), 

(19)  $D(c,x)\ge\langle\nabla f(y)-\nabla f(x),c-x\rangle \quad \forall c\in C$. 

Since the Hessian of $f$ is continuous, using Taylor's formula, we obtain 

(20)  $D(c,x) = f(c)-f(x)-\langle\nabla f(x),c-x\rangle = \tfrac12\langle c-x,\nabla^2 f(\xi)(c-x)\rangle$ where $\xi\in[c,x]$. 

Fix $\delta>0$. Since $\nabla^2 f$ is continuous on the compact set $C\cap B_\delta(x)$, there exists $\sigma=\sigma(x,\delta)>0$ such 
that $\|\nabla^2 f(\xi)\|\le 2\sigma$ for every $\xi\in C\cap B_\delta(x)$. Then (20) gives $D(c,x)\le\sigma\|c-x\|^2$. By (19), 

    $\sigma\|c-x\|^2\ge\langle\nabla f(y)-\nabla f(x),c-x\rangle \quad \forall c\in C\cap B_\delta(x)$. 

By [11, Proposition 1.1.5.(b) on page 25], $\nabla f(y)-\nabla f(x)\in N_C^P(x)$. ■ 

The following example illustrates the geodesics $\{z_\lambda : 0\le\lambda\le 1\}$ given by (7). 

Example 3.4 Let $x=(x_j)_{1\le j\le J}$ and $y=(y_j)_{1\le j\le J}$ be two points in $\mathbb{R}^J$. 

(i) If $f\colon x\mapsto\tfrac12\|x\|^2$, then $\nabla f=\nabla f^*=\operatorname{Id}$. We have 

    $z_\lambda = \lambda y + (1-\lambda)x$, 

for $\lambda\in[0,1]$. Hence $z_\lambda$ is a component-wise arithmetic mean of $x$ and $y$. 

(ii) If $f\colon x\mapsto\sum_{j=1}^J x_j\ln(x_j)-x_j$, then 

    $\nabla f(x) = (\ln x_1,\ldots,\ln x_J)$, 

    $f^*\colon x^*\mapsto\sum_{j=1}^J\exp(x_j^*)$, 

so that 

    $\nabla f^*(x^*) = (\exp x_1^*,\ldots,\exp x_J^*)$. 

We have 

(21)  $z_\lambda = \nabla f^*(\lambda\nabla f(y) + (1-\lambda)\nabla f(x))$ 
(22)  $\quad= \nabla f^*(\lambda\ln y_1 + (1-\lambda)\ln x_1,\ldots,\lambda\ln y_J + (1-\lambda)\ln x_J)$ 
(23)  $\quad= (\exp(\lambda\ln y_1 + (1-\lambda)\ln x_1),\ldots,\exp(\lambda\ln y_J + (1-\lambda)\ln x_J))$ 
(24)  $\quad= (y_1^\lambda x_1^{1-\lambda},\ldots,y_J^\lambda x_J^{1-\lambda})$. 

Hence $z_\lambda$ is a component-wise geometric mean of $x$ and $y$. 

(iii) If $f\colon x\mapsto\sum_{j=1}^J\exp(x_j)$, then $f^*\colon x^*\mapsto\sum_{j=1}^J x_j^*\ln(x_j^*)-x_j^*$ so that 

    $\nabla f(x) = (\exp(x_1),\ldots,\exp(x_J))$, $\quad\nabla f^*(x^*) = (\ln x_1^*,\ldots,\ln x_J^*)$. 

Hence 

    $z_\lambda = (\ln(\lambda\exp(y_1) + (1-\lambda)\exp(x_1)),\ldots,\ln(\lambda\exp(y_J) + (1-\lambda)\exp(x_J)))$. 
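
The geodesic formula (7) is easy to evaluate numerically. The following Python sketch, with illustrative names (`z_lambda`, `grad_entropy`, `grad_entropy_conj`) and sample points that are assumptions made here, computes $z_\lambda$ for the energy and for the Boltzmann-Shannon entropy and recovers the arithmetic and geometric means described above.

```python
import math

def z_lambda(grad_f, grad_f_conj, x, y, lam):
    """z_lambda = grad f*(lam * grad f(y) + (1 - lam) * grad f(x)), as in (7)."""
    gx, gy = grad_f(x), grad_f(y)
    return grad_f_conj([lam * b + (1.0 - lam) * a for a, b in zip(gx, gy)])

# Energy: grad f = grad f* = Id
def identity(v):
    return list(v)

# Boltzmann-Shannon entropy: grad f = ln, grad f* = exp (component-wise)
def grad_entropy(x):
    return [math.log(t) for t in x]

def grad_entropy_conj(s):
    return [math.exp(t) for t in s]

x, y = [1.0, 4.0], [4.0, 9.0]
mid_energy = z_lambda(identity, identity, x, y, 0.5)                 # arithmetic mean
mid_entropy = z_lambda(grad_entropy, grad_entropy_conj, x, y, 0.5)   # geometric mean
```

With these inputs the energy midpoint is the arithmetic mean of $x$ and $y$, while the entropy midpoint is, up to rounding, the component-wise geometric mean $(\sqrt{x_j y_j})_j$.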
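Define the symmetrization of $D$ for $x,y\in U$ by 

    $S(x,y) := D(x,y) + D(y,x) = \langle\nabla f(x)-\nabla f(y),x-y\rangle$. 

Proposition 3.5 Given $x,y\in U$ and $0<\lambda<1$, set 

    $z_\lambda := \nabla f^*(\lambda\nabla f(y) + (1-\lambda)\nabla f(x))$. 

Then we have 

(i) $D(x,y) = D(x,z_\lambda) + D(z_\lambda,y) + \frac{1-\lambda}{\lambda}S(x,z_\lambda)$. 

(ii) $S(x,y) = \frac{1}{1-\lambda}S(y,z_\lambda) + \frac{1}{\lambda}S(z_\lambda,x)$. 

Proof. Since $z_\lambda = \nabla f^*(\lambda\nabla f(y) + (1-\lambda)\nabla f(x))$, and 

    $D(x,z_\lambda) = f(x)-f(z_\lambda)-\langle\nabla f(z_\lambda),x-z_\lambda\rangle$, 

we have 

(25)  $D(x,z_\lambda) = f(x)-f(z_\lambda)-\langle\lambda\nabla f(y) + (1-\lambda)\nabla f(x),x-z_\lambda\rangle$ 
(26)  $\quad= \lambda[f(x)-f(z_\lambda)-\langle\nabla f(y),x-z_\lambda\rangle] + (1-\lambda)[f(x)-f(z_\lambda)-\langle\nabla f(x),x-z_\lambda\rangle]$ 
(27)  $\quad= \lambda[D(x,y)-D(z_\lambda,y)] - (1-\lambda)[f(z_\lambda)-f(x)-\langle\nabla f(x),z_\lambda-x\rangle]$ 
(28)  $\quad= \lambda[D(x,y)-D(z_\lambda,y)] - (1-\lambda)D(z_\lambda,x)$. 

Hence $(1-\lambda)[D(x,z_\lambda) + D(z_\lambda,x)] + \lambda D(x,z_\lambda) + \lambda D(z_\lambda,y) = \lambda D(x,y)$. Dividing both sides by $\lambda$ 
yields 

(29)  $D(x,y) = D(x,z_\lambda) + D(z_\lambda,y) + \frac{1-\lambda}{\lambda}S(x,z_\lambda)$, 

which is (i). 

To see (ii), we rewrite 

    $z_\lambda = \nabla f^*((1-\lambda)\nabla f(x) + \lambda\nabla f(y))$. 

Applying (i) with the roles of $x$ and $y$ interchanged (and $\lambda$ replaced by $1-\lambda$), we get 

(30)  $D(y,x) = D(y,z_\lambda) + D(z_\lambda,x) + \frac{\lambda}{1-\lambda}S(y,z_\lambda)$. 

Adding (29) and (30), we obtain 

(31)  $S(x,y) = D(x,y) + D(y,x)$ 
(32)  $\quad= [D(z_\lambda,y) + D(y,z_\lambda)] + [D(z_\lambda,x) + D(x,z_\lambda)] + \frac{1-\lambda}{\lambda}S(x,z_\lambda) + \frac{\lambda}{1-\lambda}S(y,z_\lambda)$ 
(33)  $\quad= \Big(1+\frac{\lambda}{1-\lambda}\Big)S(z_\lambda,y) + \Big(1+\frac{1-\lambda}{\lambda}\Big)S(x,z_\lambda)$ 
(34)  $\quad= \frac{1}{1-\lambda}S(y,z_\lambda) + \frac{1}{\lambda}S(z_\lambda,x)$, 

which is (ii). ■ 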
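
As a sanity check on Proposition 3.5, the two identities can be verified numerically. The sketch below does this for the Kullback-Leibler divergence of Example 2.1(ii), where $z_\lambda$ is the weighted geometric mean of Example 3.4(ii); the helper names (`kl`, `sym`, `z_lambda`) and the sample data are illustrative assumptions, not taken from the paper.

```python
import math

def kl(x, y):
    """Kullback-Leibler divergence D(x, y) for strictly positive x and y."""
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, y))

def sym(x, y):
    """Symmetrization S(x, y) = D(x, y) + D(y, x)."""
    return kl(x, y) + kl(y, x)

def z_lambda(x, y, lam):
    """Geodesic point for the entropy: component-wise y_j^lam * x_j^(1 - lam)."""
    return [b ** lam * a ** (1.0 - lam) for a, b in zip(x, y)]

x, y, lam = [0.3, 2.0, 1.1], [1.5, 0.4, 0.9], 0.35
z = z_lambda(x, y, lam)

# Proposition 3.5(i)
lhs_i = kl(x, y)
rhs_i = kl(x, z) + kl(z, y) + (1.0 - lam) / lam * sym(x, z)

# Proposition 3.5(ii)
lhs_ii = sym(x, y)
rhs_ii = sym(y, z) / (1.0 - lam) + sym(z, x) / lam
```

Both pairs agree up to floating-point rounding.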



4 Bregman Nearest Points and Maximal Monotone Operators 

We shall need the following pointwise version of a concept due to Rockafellar and Wets [21, 
Definition 1.16]. 

Definition 4.1 Let $g\colon\mathbb{R}^J\times\mathbb{R}^J\to\left]-\infty,+\infty\right]$ and let $\bar{y}\in\mathbb{R}^J$. We say that $g$ is level bounded in 
the first variable locally uniformly at $\bar{y}$, if for every $\alpha\in\mathbb{R}$, there exists $\delta>0$ such that 

    $\bigcup_{y\in B_\delta(\bar{y})}\{x\in\mathbb{R}^J : g(x,y)\le\alpha\}$ is bounded. 

Proposition 4.2 The Bregman distance $D$ is level bounded in the first variable locally uniformly 
at every point in $U$. 

Proof. Suppose the opposite. Then, for some $\bar{y}\in U$, $\alpha\in\mathbb{R}$, for every $n\in\{1,2,\ldots\}$, there exist 
$x_n\in\operatorname{dom}f$ and $y_n\in U$ such that 

    $\|y_n-\bar{y}\|\le\frac{1}{n}$, $\quad D(x_n,y_n)\le\alpha$, $\quad\|x_n\|\ge n$. 

We then have $y_n\to\bar{y}$, $\|x_n\|\to\infty$ and 

(35)  $D(x_n,y_n)\le\alpha$. 

Now 

(36)  $D(x_n,y_n) = f(x_n)-f(y_n)-\langle\nabla f(y_n),x_n-y_n\rangle$ 
(37)  $\quad= f(x_n)-\langle\nabla f(y_n),x_n\rangle + [-f(y_n)+\langle\nabla f(y_n),y_n\rangle]$. 

Since $f$ is Legendre, $\nabla f$ is continuous on $U$. When $n\to\infty$, we have 

(38)  $-f(y_n)+\langle\nabla f(y_n),y_n\rangle\to-f(\bar{y})+\langle\nabla f(\bar{y}),\bar{y}\rangle$, 

and 

(39)  $f(x_n)-\langle\nabla f(y_n),x_n\rangle = \|x_n\|\Big(\frac{f(x_n)}{\|x_n\|}-\Big\langle\nabla f(y_n),\frac{x_n}{\|x_n\|}\Big\rangle\Big)$ 
(40)  $\quad\ge\|x_n\|\Big(\frac{f(x_n)}{\|x_n\|}-\|\nabla f(y_n)\|\Big)\to\infty$, 

since $\|\nabla f(y_n)\|\to\|\nabla f(\bar{y})\|$ and $\lim f(x_n)/\|x_n\| = +\infty$. Altogether, (38)-(40) and (37) show that 
$D(x_n,y_n)\to\infty$, but this contradicts (35). ■ 



The following result will be very useful later. 
Theorem 4.3 The following hold. 

(i) For each $y\in U$, the set $\overleftarrow{P}_C(y)$ is nonempty and compact. Moreover, $\overleftarrow{D}_C$ is continuous on $U$. 

(ii) If $x_n\in\overleftarrow{P}_C(y_n)$ and $y_n\to y\in U$, then the sequence $(x_n)_{n=1}^\infty$ is bounded, and all its cluster 
points lie in $\overleftarrow{P}_C(y)$. 

(iii) Let $y\in U$ and $\overleftarrow{P}_C(y)=\{x\}$. If $x_n\in\overleftarrow{P}_C(y_n)$ and $y_n\to y$, then $x_n\to x$; consequently, $\overleftarrow{P}_C$ is 
continuous at $y$. 

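Proof. Fix $\bar{y}\in U$ and $\delta>0$ such that $B_\delta(\bar{y})\subset U$. Consider the proper lower semicontinuous 
function $g\colon\mathbb{R}^J\times\mathbb{R}^J\to\left]-\infty,+\infty\right]$ defined by 

    $(x,y)\mapsto D(x,y)+\iota_C(x)+\iota_{B_\delta(\bar{y})}(y)$. 

Observe that $\operatorname{dom}g = C\times B_\delta(\bar{y})$. For every $y\in\mathbb{R}^J$ and $\alpha\in\mathbb{R}$, we have 

(41)  $\{x\in\mathbb{R}^J : g(x,y)\le\alpha\} = \begin{cases} C\cap\{x\in\mathbb{R}^J : D(x,y)\le\alpha\}, & \text{if } y\in B_\delta(\bar{y});\\ \varnothing, & \text{otherwise.}\end{cases}$ 

We now show that 

(42)  $g$ is level bounded in the first variable locally uniformly at every point in $\mathbb{R}^J$. 

To this end, fix $\bar{z}\in\mathbb{R}^J$ and $\alpha\in\mathbb{R}$. 

Case 1: $\bar{z}\notin B_\delta(\bar{y})$. 

Let $\varepsilon>0$ be so small that $B_\delta(\bar{y})\cap B_\varepsilon(\bar{z})=\varnothing$. Then (41) yields 

    $\bigcup_{z\in B_\varepsilon(\bar{z})}\{x\in\mathbb{R}^J : g(x,z)\le\alpha\}=\varnothing$, 

which is certainly bounded. 

Case 2: $\bar{z}\in B_\delta(\bar{y})$. 

Since $B_\delta(\bar{y})\subset U$, we have $\bar{z}\in U$. Proposition 4.2 guarantees the existence of $\varepsilon>0$ such that 

    $\bigcup_{z\in B_\varepsilon(\bar{z})}\{x\in\mathbb{R}^J : D(x,z)\le\alpha\}$ is bounded. 

In view of (41), the set 

    $\bigcup_{z\in B_\varepsilon(\bar{z})\cap B_\delta(\bar{y})} C\cap\{x\in\mathbb{R}^J : D(x,z)\le\alpha\} = \bigcup_{z\in B_\varepsilon(\bar{z})}\{x\in\mathbb{R}^J : g(x,z)\le\alpha\}$ 

is bounded as well. 

Altogether, we have verified (42). 

Define a function $m$ at $y\in\mathbb{R}^J$ by 

    $m(y) := \inf_{x\in\mathbb{R}^J} g(x,y)$. 

Then $m=\overleftarrow{D}_C+\iota_{B_\delta(\bar{y})}$, that is, 

    $m(y) = \begin{cases}\inf_{x\in C}D(x,y)=\overleftarrow{D}_C(y), & \text{if } y\in B_\delta(\bar{y});\\ +\infty, & \text{otherwise,}\end{cases}$ 

and 

    $\operatorname{argmin}_{x\in\mathbb{R}^J}g(x,y) = \begin{cases}\operatorname{argmin}_{x\in C}D(x,y)=\overleftarrow{P}_C(y), & \text{if } y\in B_\delta(\bar{y});\\ \varnothing, & \text{otherwise.}\end{cases}$ 

Now (42) and [21, Theorem 1.17(a)] imply that if $y\in B_\delta(\bar{y})$, then $\overleftarrow{P}_C(y)$ is nonempty and 
compact. In particular, $\overleftarrow{P}_C(\bar{y})$ is nonempty and compact. Take $\bar{x}\in\overleftarrow{P}_C(\bar{y})$. As 

    $g(\bar{x},\cdot) = D(\bar{x},\cdot)+\iota_{B_\delta(\bar{y})}$ 

is continuous at $\bar{y}$, by [21, Theorem 1.17(c)], the function $m$ is continuous at $\bar{y}$. Hence $\overleftarrow{D}_C$ is 
continuous at $\bar{y}$. Since $\bar{y}\in U$ is arbitrary, this proves (i). Next, [21, Theorem 1.17(b)] gives (ii) 
since $\overleftarrow{D}_C$ is continuous on $U$. 

Finally, (iii) is an immediate consequence of (ii). ■ 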

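Our next result states that $\overleftarrow{P}_C\circ\nabla f^*$ is a monotone operator. This is also related to [3, 
Proposition 3.32.(ii)(c)], which establishes a stronger property when $C$ is convex. 

Proposition 4.4 For every $x,y$ in $U$, 

(43)  $\langle\overleftarrow{P}_C(y)-\overleftarrow{P}_C(x),\nabla f(y)-\nabla f(x)\rangle\ge 0$; 

consequently, $\overleftarrow{P}_C\circ\nabla f^*$ is monotone. 

Proof. Since 

    $D(\overleftarrow{P}_C(x),y)\ge D(\overleftarrow{P}_C(y),y)$, $\quad D(\overleftarrow{P}_C(y),x)\ge D(\overleftarrow{P}_C(x),x)$, 

we use (4) to get 

    $f(\overleftarrow{P}_C(x))-f(\overleftarrow{P}_C(y))-\langle\nabla f(y),\overleftarrow{P}_C(x)-\overleftarrow{P}_C(y)\rangle\ge 0$, 
    $f(\overleftarrow{P}_C(y))-f(\overleftarrow{P}_C(x))-\langle\nabla f(x),\overleftarrow{P}_C(y)-\overleftarrow{P}_C(x)\rangle\ge 0$. 

Adding these inequalities yields 

    $\langle\nabla f(y),\overleftarrow{P}_C(y)-\overleftarrow{P}_C(x)\rangle-\langle\nabla f(x),\overleftarrow{P}_C(y)-\overleftarrow{P}_C(x)\rangle\ge 0$, 

i.e., (43). The monotonicity now follows from Fact 2.2 and our assumption that $\operatorname{dom}f^*=\mathbb{R}^J$. ■ 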
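
Inequality (43) does not require convexity of $C$, which can be observed numerically. The following Python sketch is an illustration under assumptions made here: $C$ is a small finite (hence closed, nonconvex) subset of $U$, $D$ is the Kullback-Leibler divergence of Example 2.1(ii), and the nearest points are found by brute force. The names `kl`, `left_projection` and the sample set are hypothetical.

```python
import math
import random

def kl(x, y):
    """Kullback-Leibler divergence D(x, y) for strictly positive x and y."""
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, y))

def left_projection(C, y, tol=1e-12):
    """All points of the finite set C that minimize D(., y), up to tol."""
    dists = [kl(c, y) for c in C]
    best = min(dists)
    return [c for c, d in zip(C, dists) if d <= best + tol]

# A finite, nonconvex subset of U = {x > 0}
C = [(0.5, 2.0), (2.0, 0.5), (3.0, 3.0)]

rng = random.Random(0)
worst = float("inf")
for _ in range(1000):
    x = [rng.uniform(0.1, 5.0) for _ in range(2)]
    y = [rng.uniform(0.1, 5.0) for _ in range(2)]
    for px in left_projection(C, x):
        for py in left_projection(C, y):
            # <P(y) - P(x), grad f(y) - grad f(x)> with grad f = ln
            value = sum((b - a) * (math.log(v) - math.log(u))
                        for a, b, u, v in zip(px, py, x, y))
            worst = min(worst, value)
```

Over all sampled pairs, the smallest value of the inner product stays nonnegative up to rounding, as (43) predicts.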

Definition 4.5 The set $C$ is Chebyshev with respect to the left Bregman distance, or simply 
$\overleftarrow{D}$-Chebyshev, if for every $x\in U$, $\overleftarrow{P}_C(x)$ is nonempty and a singleton. 

For some instances of $f$, it is known that if $C$ is convex, then it is $\overleftarrow{D}$-Chebyshev (see, e.g., [2, 
Theorem 3.14]) and $\overleftarrow{P}_C$ is continuous (see, e.g., [H Proposition 3.10(i)]). The next result is a 
refinement. 

Proposition 4.6 Suppose that $C$ is $\overleftarrow{D}$-Chebyshev. Then $\overleftarrow{P}_C\colon U\to C$ is continuous. Hence 
$\overleftarrow{P}_C\circ\nabla f^*$ is continuous and maximal monotone. 

Proof. The continuity of $\overleftarrow{P}_C$ follows from Theorem 4.3(iii), while Proposition 4.4 shows that 
$\overleftarrow{P}_C\circ\nabla f^*$ is monotone. Since $\overleftarrow{P}_C$ is continuous on $U$ and $\nabla f^*\colon\mathbb{R}^J\to U$ is continuous, we conclude 
that $\overleftarrow{P}_C\circ\nabla f^*$ is continuous on $\mathbb{R}^J$. Altogether, since $\overleftarrow{P}_C\circ\nabla f^*$ is single-valued, it is maximal 
monotone on $\mathbb{R}^J$ by [21, Example 12.7]. ■ 

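Rockafellar's well-known result on the virtual convexity of the range of a maximal monotone 
operator allows us to show that $\overleftarrow{D}$-Chebyshev sets are convex. Our proof extends a Hilbert space 
technique due to Berens and Westphal [5]. 

Theorem 4.7 ($\overleftarrow{D}$-Chebyshev sets are convex) Suppose that $C$ is $\overleftarrow{D}$-Chebyshev. Then $C$ is 
convex. 

Proof. By Proposition 4.6, $\overleftarrow{P}_C\circ\nabla f^*$ is a maximal monotone operator on $\mathbb{R}^J$. Using [21, Theorem 
12.41] (or [22, Theorem 19.2]), $\operatorname{cl}[\operatorname{ran}\overleftarrow{P}_C\circ\nabla f^*]$ is convex. Since $\operatorname{ran}\nabla f^*=U$ and $C\subset U$, it follows 
that 

    $C\supset\operatorname{ran}(\overleftarrow{P}_C\circ\nabla f^*) = \overleftarrow{P}_C(\nabla f^*(\mathbb{R}^J)) = \overleftarrow{P}_C(U)\supset\overleftarrow{P}_C(C) = C$, 

from which $\operatorname{cl}[\operatorname{ran}\overleftarrow{P}_C\circ\nabla f^*] = \operatorname{cl}C = C$. Hence $C$ is convex. ■ 

Corollary 4.8 The set $C$ is $\overleftarrow{D}$-Chebyshev if and only if it is convex. 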
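
The contrapositive of Theorem 4.7 can be illustrated on the smallest nonconvex example, a two-point set. In the Python sketch below (an illustration with hypothetical names `kl` and `left_projection`, using the Kullback-Leibler divergence of Example 2.1(ii)), the point $y=(1,1)$ lies on the "Bregman bisector" of the two points of $C$, so its left Bregman projection is not a singleton.

```python
import math

def kl(x, y):
    """Kullback-Leibler divergence D(x, y) for strictly positive x and y."""
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, y))

def left_projection(C, y, tol=1e-12):
    """All points of the finite set C that minimize D(., y), up to tol."""
    dists = [kl(c, y) for c in C]
    best = min(dists)
    return [c for c, d in zip(C, dists) if d <= best + tol]

# Two points: a closed but nonconvex subset of U = {x > 0}
C = [(1.0, 2.0), (2.0, 1.0)]
midpoint = tuple(0.5 * (a + b) for a, b in zip(*C))  # (1.5, 1.5), not in C

# D(a, y) = D(b, y) holds iff <ln y, a - b> = f(a) - f(b); here f(a) = f(b),
# so every y with y_1 = y_2 is equidistant from both points.
y_tie = (1.0, 1.0)
nearest_at_tie = left_projection(C, y_tie)

# Away from the bisector the projection is single-valued.
nearest_off_tie = left_projection(C, (1.0, 1.2))
```

Here `nearest_at_tie` contains both points of $C$, so $C$ is not $\overleftarrow{D}$-Chebyshev, in line with the fact that $C$ is not convex.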



5 Subdifferentiabilities of Bregman Distances 

Let us show that $\overleftarrow{D}_C$ is locally Lipschitz on $U$. 

Proposition 5.1 Suppose $f$ is twice continuously differentiable on $U$. Then the left Bregman 
distance function satisfies 

(44)  $\overleftarrow{D}_C = f^*\circ\nabla f-(f+\iota_C)^*\circ\nabla f = [f^*-(f+\iota_C)^*]\circ\nabla f$, 

and it is locally Lipschitz on $U$. 
Proof. The Mean Value Theorem and the continuity of $\nabla^2 f$ on $U$ imply that $\nabla f$ is locally Lipschitz 
on $U$. For $y\in U$, 

(45)  $\overleftarrow{D}_C(y) = \inf_{c\in C}[f(c)-f(y)-\langle\nabla f(y),c-y\rangle]$ 
(46)  $\quad= \inf_{c}[(f+\iota_C)(c)-\langle\nabla f(y),c\rangle+f^*(\nabla f(y))]$ 
(47)  $\quad= f^*(\nabla f(y))-\sup_{c}[\langle\nabla f(y),c\rangle-(f+\iota_C)(c)]$ 
(48)  $\quad= f^*(\nabla f(y))-(f+\iota_C)^*(\nabla f(y))$. 

Note that $f+\iota_C\ge f$, hence $(f+\iota_C)^*\le f^*$, so $\operatorname{dom}f^*\subset\operatorname{dom}(f+\iota_C)^*$. Being convex functions, both 
$(f+\iota_C)^*$ and $f^*$ are locally Lipschitz on the interior of their respective domains, in particular on 
$\operatorname{int}\operatorname{dom}f^*=\mathbb{R}^J$. Since $\nabla f\colon U\to\mathbb{R}^J$ is locally Lipschitz, we conclude that $\overleftarrow{D}_C$ is locally Lipschitz 
on $U$. ■ 
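
Formula (44) can be checked directly when $C$ is finite, since then $(f+\iota_C)^*(s)=\max_{c\in C}[\langle s,c\rangle-f(c)]$. The following Python sketch does this for the Boltzmann-Shannon entropy, for which $f^*(s)=\sum_j e^{s_j}$ and $\nabla f=\ln$; the set $C$, the point $y$ and all function names are illustrative assumptions.

```python
import math

def entropy(x):
    """Boltzmann-Shannon entropy f(x) = sum x_j ln x_j - x_j for x > 0."""
    return sum(t * math.log(t) - t for t in x)

def kl(x, y):
    """Bregman distance of the entropy: the Kullback-Leibler divergence."""
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, y))

def left_distance(C, y):
    """Left Bregman nearest-distance function: min over c in C of D(c, y)."""
    return min(kl(c, y) for c in C)

def conj_f(s):
    """Fenchel conjugate f*(s) = sum exp(s_j)."""
    return sum(math.exp(t) for t in s)

def conj_f_plus_indicator(C, s):
    """(f + iota_C)*(s) = max over c in C of <s, c> - f(c), for finite C."""
    return max(sum(si * ci for si, ci in zip(s, c)) - entropy(c) for c in C)

C = [(0.5, 2.0), (2.0, 0.5), (3.0, 3.0)]
y = (1.0, 1.5)
s = [math.log(t) for t in y]  # s = grad f(y)

lhs = left_distance(C, y)
rhs = conj_f(s) - conj_f_plus_indicator(C, s)
```

The two quantities `lhs` and `rhs` coincide up to rounding, as (45)-(48) predict.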

For a function $g$ that is finite and locally Lipschitz at a point $y$, we define the Dini subderivative 
and Clarke subderivative of $g$ at $y$ in the direction $w$, denoted respectively by $\mathrm{d}g(y)(w)$ and 
$\hat{\mathrm{d}}g(y)(w)$, via 

    $\mathrm{d}g(y)(w) := \liminf_{t\downarrow 0}\frac{g(y+tw)-g(y)}{t}$, 

    $\hat{\mathrm{d}}g(y)(w) := \limsup_{x\to y,\ t\downarrow 0}\frac{g(x+tw)-g(x)}{t}$, 

and the corresponding Dini subdifferential and Clarke subdifferential via 

    $\hat{\partial}g(y) := \{y^*\in\mathbb{R}^J : \langle y^*,w\rangle\le\mathrm{d}g(y)(w),\ \forall w\in\mathbb{R}^J\}$, 

    $\bar{\partial}g(y) := \{y^*\in\mathbb{R}^J : \langle y^*,w\rangle\le\hat{\mathrm{d}}g(y)(w),\ \forall w\in\mathbb{R}^J\}$. 

Furthermore, the limiting subdifferential is defined by 

    $\partial_L g(y) := \limsup_{x\to y}\hat{\partial}g(x)$, 

see [21, Definition 8.3]. We say that $g$ is Clarke regular at $y$ if $\mathrm{d}g(y)(w)=\hat{\mathrm{d}}g(y)(w)$ for 
every $w\in\mathbb{R}^J$, or equivalently $\hat{\partial}g(y)=\bar{\partial}g(y)$. For further properties of these subdifferentials and 
subderivatives, see (T3 j [T9 t I2T]. 

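We now study the subdifferentiability of $\overleftarrow{D}_C$ in terms of $\overleftarrow{P}_C$. 

Proposition 5.2 Suppose $f$ is twice continuously differentiable on $U$. Then the function $-\overleftarrow{D}_C$ is 
Dini subdifferentiable on $U$; more precisely, if $y\in U$, then 

    $\nabla^2 f(y)[\overleftarrow{P}_C(y)-y]\subset\hat{\partial}(-\overleftarrow{D}_C)(y)$, 

and thus 

(49)  $\nabla^2 f(y)[\operatorname{conv}\overleftarrow{P}_C(y)-y]\subset\hat{\partial}(-\overleftarrow{D}_C)(y)$. 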
Proof. Fix $y\in U$. By Theorem 4.3(i), $\overleftarrow{P}_C(y)\ne\varnothing$. Let $x\in\overleftarrow{P}_C(y)$. As $\hat{\partial}$ is convex-valued, it 
suffices to show that 

(50)  $\nabla^2 f(y)(x-y)\in\hat{\partial}(-\overleftarrow{D}_C)(y)$. 

To this end, let $t>0$ and $w\in\mathbb{R}^J$. Since for sufficiently small $t$, $y+tw\in U$, 

(51)  $-\overleftarrow{D}_C(y+tw) = \sup_{c\in C}\big(-f(c)+f(y+tw)+\langle\nabla f(y+tw),c-(y+tw)\rangle\big)$ 
(52)  $\quad\ge-f(x)+f(y+tw)+\langle\nabla f(y+tw),x-(y+tw)\rangle$ 

and 

(53)  $\overleftarrow{D}_C(y) = f(x)-f(y)-\langle\nabla f(y),x-y\rangle$, 

we have 

    $-\overleftarrow{D}_C(y+tw)+\overleftarrow{D}_C(y)\ge f(y+tw)-f(y)+\langle\nabla f(y+tw)-\nabla f(y),x-y\rangle+\langle\nabla f(y+tw),-tw\rangle$. 

Dividing both sides by $t$ and taking the limit inferior as $t\downarrow 0$, we have 

(54)  $\mathrm{d}(-\overleftarrow{D}_C)(y)(w)\ge\langle\nabla f(y),w\rangle+\langle\nabla^2 f(y)w,x-y\rangle-\langle\nabla f(y),w\rangle$ 
(55)  $\quad=\langle\nabla^2 f(y)(x-y),w\rangle$, 

which gives (50). ■ 

Lemma 5.3 Suppose that $f$ is twice continuously differentiable on $U$, let $y\in U$, and suppose that 
$\overleftarrow{P}_C(y)$ is a singleton. Then $\overleftarrow{D}_C$ is Dini subdifferentiable at $y$ and 

(56)  $\nabla^2 f(y)(y-\overleftarrow{P}_C(y))\in\hat{\partial}\overleftarrow{D}_C(y)$. 

Proof. Suppose that $\overleftarrow{P}_C(y)=\{x\}$, and fix $w\in\mathbb{R}^J$. Let $(t_n)$ be a positive sequence such that 
$(y+t_n w)$ lies in $U$, $t_n\downarrow 0$, and 

    $\mathrm{d}\overleftarrow{D}_C(y)(w) = \lim_{n\to\infty}\frac{\overleftarrow{D}_C(y+t_n w)-\overleftarrow{D}_C(y)}{t_n}$. 

Select $x_n\in\overleftarrow{P}_C(y+t_n w)$, which is possible by Theorem 4.3(i). We have 

    $\overleftarrow{D}_C(y+t_n w)-\overleftarrow{D}_C(y)$ 
    $\quad= D(x_n,y+t_n w)-D(x_n,y)+D(x_n,y)-D(x,y)$ 
    $\quad\ge D(x_n,y+t_n w)-D(x_n,y)$ 
    $\quad= f(x_n)-f(y+t_n w)-\langle\nabla f(y+t_n w),x_n-(y+t_n w)\rangle-[f(x_n)-f(y)-\langle\nabla f(y),x_n-y\rangle]$ 
    $\quad= -(f(y+t_n w)-f(y))-\langle\nabla f(y+t_n w)-\nabla f(y),x_n-y\rangle+t_n\langle\nabla f(y+t_n w),w\rangle$. 

Dividing by $t_n$, we get 

(57)  $\frac{\overleftarrow{D}_C(y+t_n w)-\overleftarrow{D}_C(y)}{t_n}\ge-\frac{f(y+t_n w)-f(y)}{t_n}-\frac{\langle\nabla f(y+t_n w)-\nabla f(y),x_n-y\rangle}{t_n}+\langle\nabla f(y+t_n w),w\rangle$. 

By Theorem 4.3(iii), $x_n\to x$. Taking limits in (57) yields 

    $\mathrm{d}\overleftarrow{D}_C(y)(w)\ge-\langle\nabla^2 f(y)w,x-y\rangle = \langle\nabla^2 f(y)(y-x),w\rangle$. 

Since this holds for every $w\in\mathbb{R}^J$, we conclude that $\nabla^2 f(y)(y-x)\in\hat{\partial}\overleftarrow{D}_C(y)$. ■ 

Lemma 5.3 allows us to generalize [21, Example 8.53] from the Euclidean distance to the left 
Bregman distance. It delineates the differences between the Dini subdifferential, limiting 
subdifferential and Clarke subdifferential. 

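Theorem 5.4 Suppose that $f$ is twice continuously differentiable on $U$ and that for every $u\in U$, 
$\nabla^2 f(u)$ is positive definite. Set $g=\overleftarrow{D}_C$, and let $y\in U$ and $w\in\mathbb{R}^J$. Then the following hold. 

(i) The Dini subderivative is 

(58)  $\mathrm{d}g(y)(w) = \min_{x\in\overleftarrow{P}_C(y)}\langle\nabla^2 f(y)(y-x),w\rangle$, 

so that the Dini subdifferential of $g$ is 

(59)  $\hat{\partial}g(y) = \begin{cases}\{\nabla^2 f(y)[y-\overleftarrow{P}_C(y)]\}, & \text{if } \overleftarrow{P}_C(y) \text{ is a singleton};\\ \varnothing, & \text{otherwise.}\end{cases}$ 

The limiting subdifferential is 

(60)  $\partial_L g(y) = \nabla^2 f(y)[y-\overleftarrow{P}_C(y)]$. 

The Clarke subderivative is 

(61)  $\hat{\mathrm{d}}g(y)(w) = \max_{x\in\overleftarrow{P}_C(y)}\langle\nabla^2 f(y)(y-x),w\rangle$, 

from which we get the Clarke subdifferential 

(62)  $\bar{\partial}g(y) = \nabla^2 f(y)[y-\operatorname{conv}\overleftarrow{P}_C(y)]$. 

Hence $-\overleftarrow{D}_C$ is Clarke regular on $U$. 

(ii) If $y\in C$, then $g$ is strictly differentiable with derivative 0. 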
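Proof. By Theorem 4.3(i), $\overleftarrow{P}_C(y)\ne\varnothing$. Fix $x\in\overleftarrow{P}_C(y)$ and $t>0$ sufficiently small so that 
$y+tw\in U$. In view of $\overleftarrow{D}_C(y+tw)\le D(x,y+tw)$ and $\overleftarrow{D}_C(y)=D(x,y)$, we have 

    $\mathrm{d}g(y)(w) = \liminf_{t\downarrow 0}\frac{\overleftarrow{D}_C(y+tw)-\overleftarrow{D}_C(y)}{t}\le\liminf_{t\downarrow 0}\frac{D(x,y+tw)-D(x,y)}{t}$ 
    $\quad= \liminf_{t\downarrow 0}\frac{f(x)-f(y+tw)-\langle\nabla f(y+tw),x-(y+tw)\rangle-[f(x)-f(y)-\langle\nabla f(y),x-y\rangle]}{t}$ 
    $\quad= \liminf_{t\downarrow 0}\frac{-[f(y+tw)-f(y)]-\langle\nabla f(y+tw)-\nabla f(y),x-y\rangle+t\langle\nabla f(y+tw),w\rangle}{t}$ 
    $\quad= \liminf_{t\downarrow 0}\Big(-\frac{f(y+tw)-f(y)}{t}-\frac{\langle\nabla f(y+tw)-\nabla f(y),x-y\rangle}{t}+\langle\nabla f(y+tw),w\rangle\Big)$ 
    $\quad= -\langle\nabla f(y),w\rangle-\langle\nabla^2 f(y)w,x-y\rangle+\langle\nabla f(y),w\rangle$ 
    $\quad= \langle\nabla^2 f(y)(y-x),w\rangle$. 

Since this holds for every $x\in\overleftarrow{P}_C(y)$, it follows from Theorem 4.3(i) that 

    $\mathrm{d}g(y)(w)\le\min_{x\in\overleftarrow{P}_C(y)}\langle\nabla^2 f(y)(y-x),w\rangle$. 

To get the opposite inequality, we consider a positive sequence $(t_n)$ such that $t_n\downarrow 0$, $(y+t_n w)$ 
lies in $U$, and 

    $\mathrm{d}g(y)(w) = \lim_{n\to\infty}\frac{\overleftarrow{D}_C(y+t_n w)-\overleftarrow{D}_C(y)}{t_n}$. 

Select $x_n\in\overleftarrow{P}_C(y+t_n w)$, which is possible by Theorem 4.3(i). Then 

    $\overleftarrow{D}_C(y+t_n w) = D(x_n,y+t_n w)$ 
(63)  $\quad= f(x_n)-f(y+t_n w)-\langle\nabla f(y+t_n w),x_n-(y+t_n w)\rangle$ 

and 

(64)  $\overleftarrow{D}_C(y)\le D(x_n,y) = f(x_n)-f(y)-\langle\nabla f(y),x_n-y\rangle$. 

By Theorem 4.3(ii), and after taking a subsequence if necessary, we assume that $x_n\to x\in\overleftarrow{P}_C(y)$. 
We estimate 

(65)  $\frac{\overleftarrow{D}_C(y+t_n w)-\overleftarrow{D}_C(y)}{t_n}\ge\frac{-[f(y+t_n w)-f(y)]-\langle\nabla f(y+t_n w)-\nabla f(y),x_n-y\rangle+\langle\nabla f(y+t_n w),t_n w\rangle}{t_n}$ 
    $\quad= -\frac{f(y+t_n w)-f(y)}{t_n}-\frac{\langle\nabla f(y+t_n w)-\nabla f(y),x_n-y\rangle}{t_n}+\langle\nabla f(y+t_n w),w\rangle$. 

Taking limits, we obtain 

    $\mathrm{d}g(y)(w)\ge-\langle\nabla^2 f(y)w,x-y\rangle = \langle\nabla^2 f(y)(y-x),w\rangle\ge\min_{x\in\overleftarrow{P}_C(y)}\langle\nabla^2 f(y)(y-x),w\rangle$. 

Therefore, (58) is correct. 

For $y^*\in\mathbb{R}^J$, $y^*\in\hat{\partial}g(y)$ if and only if 

    $\langle y^*,w\rangle\le\langle\nabla^2 f(y)(y-x),w\rangle \quad \forall x\in\overleftarrow{P}_C(y),\ w\in\mathbb{R}^J$. 

This holds if and only if $y^*=\nabla^2 f(y)(y-x)$ for every $x\in\overleftarrow{P}_C(y)$; since $\nabla^2 f(y)$ is invertible, we deduce 
that $x=y-(\nabla^2 f(y))^{-1}y^*$, so that $\overleftarrow{P}_C(y)$ is unique. Therefore, if $\overleftarrow{P}_C(y)$ is not unique, then $\hat{\partial}g(y)$ 
has to be empty. Hence (59) holds. 

For every $z\in U$, we have 

    $\hat{\partial}g(z)\subset\nabla^2 f(z)(z-\overleftarrow{P}_C(z))$. 

The upper semicontinuity of $\overleftarrow{P}_C$ (see Theorem 4.3(ii)) implies through $\partial_L g(y)=\limsup_{z\to y}\hat{\partial}g(z)$ 
that 

(66)  $\partial_L g(y)\subset\nabla^2 f(y)(y-\overleftarrow{P}_C(y))$. 

Equality actually has to hold. Indeed, for $x\in\overleftarrow{P}_C(y)$ and $0\le\lambda<1$, the point 

    $z_\lambda := \nabla f^*(\lambda\nabla f(y)+(1-\lambda)\nabla f(x))$, 

has $\overleftarrow{P}_C(z_\lambda)=\{x\}$ by Proposition 3.2(ii). Lemma 5.3 shows that 

    $\nabla^2 f(z_\lambda)(z_\lambda-x)\in\hat{\partial}g(z_\lambda)$, 

where $\nabla^2 f(z_\lambda)(z_\lambda-x)\to\nabla^2 f(y)(y-x)$ as $\lambda\to 1$, since $\nabla^2 f$ is continuous. Thus $\nabla^2 f(y)(y-x)\in\partial_L g(y)$ 
and therefore 

(67)  $\nabla^2 f(y)(y-\overleftarrow{P}_C(y))\subset\partial_L g(y)$. 

Hence (66) and (67) together give (60). 

Since $g$ is locally Lipschitz around $y\in U$ by Proposition 5.1, the singular subdifferential of $g$ at 
$y$ is $\{0\}$, so that its polar cone is $\mathbb{R}^J$. Then for every $w\in\mathbb{R}^J$, using [21, Exercise 8.23] we have 

    $\hat{\mathrm{d}}g(y)(w) = \sup\{\langle y^*,w\rangle : y^*\in\partial_L g(y)\}$; 

thus, (61) follows from (60). Now (61) is the same as 

    $\hat{\mathrm{d}}g(y)(w) = \max\langle\nabla^2 f(y)(y-\operatorname{conv}\overleftarrow{P}_C(y)),w\rangle$. 

As $\operatorname{conv}\overleftarrow{P}_C(y)$ is compact, we obtain (62). Or directly apply [21, Theorem 8.49] and (60). 

The Clarke regularity of $-\overleftarrow{D}_C$ follows from combining (49) and (62). Indeed, 

    $\nabla^2 f(y)[\operatorname{conv}\overleftarrow{P}_C(y)-y]\subset\hat{\partial}(-\overleftarrow{D}_C)(y)\subset\bar{\partial}(-\overleftarrow{D}_C)(y) = \nabla^2 f(y)[\operatorname{conv}\overleftarrow{P}_C(y)-y]$, 

so that $\hat{\partial}(-\overleftarrow{D}_C)(y)=\bar{\partial}(-\overleftarrow{D}_C)(y)$. 

(ii): When $y\in C$, $\overleftarrow{P}_C(y)=\{y\}$. By (60), $\partial_L g(y)=\{0\}$, and this implies that $g$ is strictly 
differentiable at $y$ by [21, Theorem 9.18(b)]. ■ 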
Corollary 5.5 Suppose that $f$ is twice continuously differentiable and that $\nabla^2 f(y)$ is positive 
definite, for every $y\in U$. Then for $y\in U$, the following are equivalent: 

(i) $\overleftarrow{D}_C$ is Dini subdifferentiable at $y$; 

(ii) $\overleftarrow{D}_C$ is differentiable at $y$; 

(iii) $\overleftarrow{D}_C$ is strictly differentiable at $y$; 

(iv) $\overleftarrow{D}_C$ is Clarke regular at $y$; 

(v) $\overleftarrow{P}_C(y)$ is a singleton. 

If these hold, we have $\nabla\overleftarrow{D}_C(y) = \nabla^2 f(y)[y-\overleftarrow{P}_C(y)]$. 

Proof. (i)$\Rightarrow$(ii): By Proposition 5.2, both $-\overleftarrow{D}_C$ and $\overleftarrow{D}_C$ are Dini subdifferentiable. Thus $\overleftarrow{D}_C$ is 
differentiable at $y$ (see Exercise 3.4.14 on page 143]), and 

    $\hat{\partial}\overleftarrow{D}_C(y) = -\hat{\partial}(-\overleftarrow{D}_C)(y) = \{\nabla\overleftarrow{D}_C(y)\}$. 

(ii)$\Rightarrow$(i) is clear. (ii)$\Leftrightarrow$(iii)$\Leftrightarrow$(iv): This is a consequence of [23, Theorem 3.4]. (ii)$\Leftrightarrow$(v): If $\overleftarrow{D}_C$ is 
differentiable at $y$, then (59) implies that $\overleftarrow{P}_C(y)$ is a singleton. Conversely, if $\overleftarrow{P}_C(y)$ is a 
singleton, then (62), Proposition 5.1 and [10, Proposition 2.2.4] show that $\overleftarrow{D}_C$ is strictly differentiable 
and hence differentiable at $y$. Finally, the gradient formula $\nabla\overleftarrow{D}_C(y) = \nabla^2 f(y)[y-\overleftarrow{P}_C(y)]$ is a 
consequence of Proposition 5.2 or Lemma 5.3. ■ 
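
The gradient formula in Corollary 5.5 lends itself to a quick finite-difference check. The Python sketch below is an illustration under assumptions made here: $D$ is the Kullback-Leibler divergence, so that $\nabla^2 f(y)=\operatorname{diag}(1/y_j)$, and $C$ is a finite subset of $U$ with a point $y$ at which the nearest point is unique. The names `nearest` and `left_distance`, the set $C$ and the step size `h` are hypothetical choices.

```python
import math

def kl(x, y):
    """Kullback-Leibler divergence D(x, y) for strictly positive x and y."""
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, y))

def nearest(C, y):
    """One left Bregman nearest point of y in the finite set C."""
    return min(C, key=lambda c: kl(c, y))

def left_distance(C, y):
    """Left Bregman nearest-distance function of the finite set C."""
    return kl(nearest(C, y), y)

C = [(0.5, 2.0), (2.0, 0.5), (3.0, 3.0)]
y = [1.0, 1.5]
x = nearest(C, y)  # the unique nearest point at this particular y

# Formula of Corollary 5.5 with Hessian diag(1 / y_j): (y_j - x_j) / y_j
formula = [(yj - xj) / yj for xj, yj in zip(x, y)]

# Central finite differences of the nearest-distance function
h = 1e-6
numeric = []
for j in range(len(y)):
    up, down = list(y), list(y)
    up[j] += h
    down[j] -= h
    numeric.append((left_distance(C, up) - left_distance(C, down)) / (2.0 * h))
```

At this $y$ the two gradients agree to within the finite-difference error; at a point with several nearest points (such as a tie between two elements of $C$) the comparison would fail, consistent with the equivalence of (ii) and (v).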

Corollary 5.6 Suppose that $f$ is twice continuously differentiable on $U$ and that $\nabla^2 f(y)$ is positive 
definite for every $y\in U$. Then $\overleftarrow{P}_C$ is almost everywhere and generically single-valued on $U$. 

Proof. By Proposition 5.1, $\overleftarrow{D}_C$ is locally Lipschitz on $U$. Apply Rademacher's Theorem (Theorem 
9.1.2] or [11, Corollary 3.4.19]) to obtain that $\overleftarrow{D}_C$ is differentiable almost everywhere on $U$. 
Moreover, since $-\overleftarrow{D}_C$ is Clarke regular on $U$ by Theorem 5.4, we use Theorem 10] to 
conclude that $-\overleftarrow{D}_C$ is differentiable generically on $U$, and so is $\overleftarrow{D}_C$. Hence the result follows from 
Corollary 5.5. ■ 



6 Characterizations of Chebyshev Sets 

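Definition 6.1 For $g\colon\mathbb{R}^J\to\left]-\infty,+\infty\right]$ (not necessarily convex), we let 

    $\partial g(x) := \{s\in\mathbb{R}^J : g(y)\ge g(x)+\langle s,y-x\rangle\ \ \forall y\in\mathbb{R}^J\}$ if $x\in\operatorname{dom}g$; 

and $\partial g(x)=\varnothing$ otherwise; and the Fenchel conjugate of $g$ is defined by 

    $s\mapsto g^*(s) := \sup\{\langle s,x\rangle-g(x) : x\in\mathbb{R}^J\}$. 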
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According to [151 Proposition 1.4.3], 

(68) s£dg(x) x£dg*(s), 

which becomes "■<=>•" if g G T. In order to study -D-Chebyshev sets, we need two preparatory results 
concerning subdifferentiabilities of / + ic and (/ + i>c)*- Lemmas 16.21 16.31 and Theorem 16.61 below 
generalize respectively, and are inspired by \15\ Propositions 3.2.1, 3.2.2 and Theorem 3.2.3]. 

Lemma 6.2 Let x£l J . Then 

d(f + i C )(x) = {s£R J :x€ %(yt(s))} = (% o Vrr\x), 

and consequently d(f + ic) = ( Pc ° V/*) 1 . 

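Proof. The statement is clear if $x \notin C$, so assume $x \in C$. By [15, Theorem 1.4.1], 

(69) $s \in \partial(f + \iota_C)(x) \ \Leftrightarrow\ (f + \iota_C)^*(s) + (f + \iota_C)(x) = \langle s, x \rangle$. 

Proposition 5.1 shows that 

$(f + \iota_C)^* = f^* - D_C \circ \nabla f^*$ on $\mathbb{R}^J$. 

Combining with (69) and since $x \in C$, we get 

$s \in \partial(f + \iota_C)(x) \ \Leftrightarrow\ f^*(s) - (D_C \circ \nabla f^*)(s) + f(x) = \langle s, x \rangle$; 

equivalently, 

(70) $D_C(\nabla f^*(s)) = f(x) + f^*(s) - \langle s, x \rangle$ 

(71) $\quad = f(x) + f^*\big((\nabla f \circ \nabla f^*)(s)\big) - \langle (\nabla f \circ \nabla f^*)(s), x \rangle$ 

(72) $\quad = f(x) - f(\nabla f^*(s)) - \langle \nabla f(\nabla f^*(s)), x - \nabla f^*(s) \rangle$ 

(73) $\quad = D(x, \nabla f^*(s))$, 

i.e., $x \in P_C(\nabla f^*(s))$. ■ 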
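The characterization in Lemma 6.2 can be checked numerically in the simplest setting. The sketch below assumes the energy $f = \tfrac12\|\cdot\|^2$ (so that $\nabla f^*$ is the identity and $P_C$ is the Euclidean nearest-point map) and a randomly generated finite set `C`; both choices are illustrative and not taken from the paper. Since $f + \iota_C = +\infty$ off $C$, the subgradient inequality only needs to be tested at points of $C$.

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.normal(size=(8, 2))  # hypothetical finite (closed, nonconvex) set C


def g(x):
    """f = (1/2)||.||^2, evaluated row-wise; f + iota_C is +inf off C."""
    return 0.5 * np.sum(x * x, axis=-1)


def nearest_index(s):
    """Index of the Euclidean nearest point of s in C."""
    return int(np.argmin(np.sum((C - s) ** 2, axis=1)))


def is_subgradient(s, i, tol=1e-12):
    """Check g(y) >= g(x) + <s, y - x> for x = C[i] and every y in C."""
    x = C[i]
    return bool(np.all(g(C) >= g(x) + (C - x) @ s - tol))


s = rng.normal(size=2)
i = nearest_index(s)
holds_at_nearest = is_subgradient(s, i)
holds_elsewhere = [is_subgradient(s, j) for j in range(len(C)) if j != i]
```

For a generic `s` the nearest point is unique, so the subgradient inequality is expected to hold at the nearest point and at no other point of `C`.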

The following result, which establishes the link between $\partial(f + \iota_C)^*$ and $P_C \circ \nabla f^*$, is the 
cornerstone for the convexity characterization of $D$-Chebyshev sets. 

Lemma 6.3 Let $s \in \mathbb{R}^J$. Then 

$\partial(f + \iota_C)^*(s) = \operatorname{conv}[P_C(\nabla f^*(s))] = \operatorname{conv}[(P_C \circ \nabla f^*)(s)]$. 

Consequently, $P_C \circ \nabla f^*$ is monotone on $\mathbb{R}^J$. 

Proof. Since $f$ is 1-coercive and $C$ is closed, the function $f + \iota_C$ is 1-coercive and lower 
semicontinuous. We have that $\operatorname{conv}(f + \iota_C)$ is lower semicontinuous by [15, Proposition 1.5.4], and 
$\operatorname{dom}(f + \iota_C)^* = \mathbb{R}^J$ by [15, Proposition 1.3.8]. Now 

$x \in \partial(f + \iota_C)^*(s) \ \Leftrightarrow\ x \in \partial[\operatorname{conv}(f + \iota_C)]^*(s) \ \Leftrightarrow\ s \in \partial[\operatorname{conv}(f + \iota_C)](x)$, 

in which the first equivalence follows from [15, Corollary 1.3.6] and the second equivalence uses 
the lower semicontinuity of $\operatorname{conv}(f + \iota_C)$. Using [15, Theorem 1.5.6], $s \in \partial[\operatorname{conv}(f + \iota_C)](x)$ if and 
only if there exist $x_1, \ldots, x_k \in \mathbb{R}^J$ and $\alpha_1, \ldots, \alpha_k > 0$ such that 

(74) $\sum_{j=1}^{k} \alpha_j = 1$, $\quad x = \sum_{j=1}^{k} \alpha_j x_j$, $\quad$ and $\quad s \in \bigcap_{j=1}^{k} \partial(f + \iota_C)(x_j)$. 

But $s \in \partial(f + \iota_C)(x_j)$ is equivalent to 

$x_j \in P_C(\nabla f^*(s))$, 

by Lemma 6.2. Hence (74) gives $\partial(f + \iota_C)^*(s) = \operatorname{conv} P_C(\nabla f^*(s))$. Finally, as a selection of 
$\partial(f + \iota_C)^*$, which is maximal monotone, the operator $P_C \circ \nabla f^*$ is monotone. ■ 
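The monotonicity conclusion of Lemma 6.3 holds even when $C$ is not convex, which is easy to observe numerically. The following sketch again assumes the energy $f = \tfrac12\|\cdot\|^2$, so that $P_C \circ \nabla f^*$ is the Euclidean nearest-point map; the finite set `C` and the sample points `S` are hypothetical test data.

```python
import numpy as np

rng = np.random.default_rng(1)
C = rng.normal(size=(10, 2))  # hypothetical finite, nonconvex set


def project(s):
    """One Euclidean nearest point of s in C (a selection of P_C)."""
    return C[np.argmin(np.sum((C - s) ** 2, axis=1))]


S = rng.normal(scale=2.0, size=(200, 2))  # sample points s
P = np.array([project(s) for s in S])     # their selected nearest points

# All pairwise values <P(s) - P(t), s - t>; monotonicity means none is negative.
dP = P[:, None, :] - P[None, :, :]
dS = S[:, None, :] - S[None, :, :]
min_gap = float(np.min(np.sum(dP * dS, axis=2)))
```

Up to rounding, `min_gap` should be zero (attained on the diagonal and for pairs sharing a nearest point) and never negative.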

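Remark 6.4 Let $y \in \mathbb{R}^J = \operatorname{dom} f^*$. Then $(f + \iota_C)^*(y) = f^*(y) - \inf_{x \in C}[f(x) + f^*(y) - \langle y, x \rangle]$. 
Since 

$f(x) + f^*(y) - \langle y, x \rangle = f(x) + f^*(\nabla f(\nabla f^*(y))) - \langle \nabla f(\nabla f^*(y)), x \rangle = D(x, \nabla f^*(y))$, 

we have $(f + \iota_C)^*(y) = f^*(y) - D_C(\nabla f^*(y))$. Hence 

$(f + \iota_C)^* = f^* - D_C \circ \nabla f^*$; 

see also Proposition 5.1. If $f = \tfrac12\|\cdot\|^2$, then 

$(f + \iota_C)^* = \tfrac12\|\cdot\|^2 - \tfrac12 d_C^2$ 

is the so-called Asplund function, where $d_C(y) := \inf\{\|y - x\| : x \in C\}$, $\forall y \in \mathbb{R}^J$. In this case, 
Lemma 6.3 is classical; see [16, pages 262-264] or [17]. 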
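The identity $(f + \iota_C)^* = f^* - D_C \circ \nabla f^*$ of Remark 6.4 can be tested outside the Euclidean case. The sketch below assumes the separable quartic $f(x) = \sum_i x_i^4/4$, for which $\nabla f(x) = x^3$, $f^*(s) = \tfrac34 \sum_i |s_i|^{4/3}$ and $\nabla f^*(s) = s^{1/3}$ (coordinate-wise), together with a random finite set `C`. These choices are illustrative assumptions, not examples from the paper.

```python
import numpy as np


def f(x):
    """Assumed Legendre, 1-coercive function f(x) = sum(x_i^4) / 4."""
    return np.sum(x ** 4, axis=-1) / 4.0


def grad_f(x):
    return x ** 3


def f_conj(s):
    """Fenchel conjugate f*(s) = (3/4) * sum(|s_i|^(4/3))."""
    return 0.75 * np.sum(np.abs(s) ** (4.0 / 3.0), axis=-1)


grad_f_conj = np.cbrt  # inverse of grad_f, coordinate-wise


def bregman(x, y):
    """D(x, y) = f(x) - f(y) - <grad f(y), x - y>, row-wise in x."""
    return f(x) - f(y) - np.sum(grad_f(y) * (x - y), axis=-1)


rng = np.random.default_rng(2)
C = rng.normal(size=(12, 3))  # hypothetical finite set
y = rng.normal(size=3)

# Left side: (f + iota_C)^*(y) = max over x in C of <y, x> - f(x).
lhs = float(np.max(C @ y - f(C)))
# Right side: f*(y) - D_C(grad f*(y)), with D_C an infimum over C.
rhs = float(f_conj(y) - np.min(bregman(C, grad_f_conj(y))))
```

The two values `lhs` and `rhs` should agree up to floating-point error.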

We also need the following result from [23] . 

Proposition 6.5 (Soloviov) Let $g : \mathbb{R}^J \to \,]-\infty,+\infty]$ be lower semicontinuous, and assume that 
$g^*$ is essentially smooth. Then $g$ is convex. 

Now we are ready for the main result of this section. 
Theorem 6.6 (Characterizations of $D$-Chebyshev sets) The following are equivalent: 

(i) $C$ is convex; 

(ii) $C$ is $D$-Chebyshev, i.e., $P_C$ is single-valued on $U$; 

(iii) $P_C$ is continuous on $U$; 

(iv) $D_C \circ \nabla f^*$ is differentiable on $\mathbb{R}^J$; 

(v) $f + \iota_C$ is convex. 

When these equivalent conditions hold, we have 

(75) $\nabla(D_C \circ \nabla f^*) = \nabla f^* - P_C \circ \nabla f^*$ on $\mathbb{R}^J$; 

consequently, $D_C \circ \nabla f^*$ is continuously differentiable. 

If, in addition, $f$ is twice continuously differentiable on $U$ and $\nabla^2 f(y)$ is positive definite $\forall y \in U$, 
then (i)-(v) are equivalent to 

(vi) $D_C$ is differentiable on $U$. 

In this case, we have 

(76) $\nabla D_C(y) = \nabla^2 f(y)[y - P_C(y)] \quad \forall y \in U$; 

consequently, $D_C$ is continuously differentiable. 



Proof. (i) $\Rightarrow$ (ii) is well known. (ii) $\Rightarrow$ (iii) follows from Theorem 4.3(iii). 

To see (iii) $\Rightarrow$ (iv), we use Remark 6.4: 

$D_C \circ \nabla f^* = f^* - (f + \iota_C)^*$. 

Since $P_C$ is continuous on $U$ and $\nabla f^* : \mathbb{R}^J \to U$, $\partial(f + \iota_C)^*$ is single-valued on $\mathbb{R}^J$ by Lemma 6.3. 
Thus, $(f + \iota_C)^*$ is differentiable on $\mathbb{R}^J$. Altogether, $D_C \circ \nabla f^* = f^* - (f + \iota_C)^*$ is differentiable on 
$\mathbb{R}^J$. 

When $f + \iota_C$ is convex, since $C \subset U$ we have that $\operatorname{dom}(f + \iota_C) = C$ is convex, and this shows 
(v) $\Rightarrow$ (i). We now prove (iv) $\Rightarrow$ (v) and assume (iv). Remark 6.4 shows 

(77) $(f + \iota_C)^* = f^* - D_C \circ \nabla f^*$, 

which implies that 

(78) $(f + \iota_C)^*$ is differentiable on $\mathbb{R}^J$. 

Since $f + \iota_C$ is lower semicontinuous, it follows from Proposition 6.5 that $f + \iota_C$ is convex. 

When the equivalent conditions (i)-(v) hold, (75) follows from Lemma 6.3 and (77). Since $\nabla f^*$ is 
continuous and $P_C$ is continuous by (iii), we obtain that $D_C \circ \nabla f^*$ is continuously differentiable. 
When $\nabla^2 f(y)$ is positive definite $\forall y \in U$, (ii) $\Leftrightarrow$ (vi) by Corollary 5.5. Finally, (76) follows from 
Theorem 5.4, i.e., (59). This finishes the proof. ■ 
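The gradient formula (76) can be illustrated on a small example with a non-constant Hessian. The sketch below assumes $f(x) = \sum_i (x_i^4/4 + x_i^2/2)$, whose Hessian $\operatorname{diag}(3x_i^2 + 1)$ is positive definite, and the convex box $C = [1,2]^2$. For this separable $f$ and a box, each coordinate of the left projection minimizes a one-dimensional convex function over an interval, so $P_C$ reduces to coordinate-wise clipping; that reduction is specific to this assumed example. The formula is compared with a central finite-difference gradient of $D_C$.

```python
import numpy as np


def f(x):
    """Assumed f(x) = sum(x_i^4 / 4 + x_i^2 / 2)."""
    return np.sum(x ** 4 / 4 + x ** 2 / 2)


def grad_f(x):
    return x ** 3 + x


def hess_diag(x):
    """Diagonal of the Hessian of f (positive everywhere)."""
    return 3 * x ** 2 + 1


def D(x, y):
    """Bregman distance D(x, y) = f(x) - f(y) - <grad f(y), x - y>."""
    return f(x) - f(y) - grad_f(y) @ (x - y)


def P_C(y):
    """Left Bregman projection onto the box [1, 2]^2 for this separable f."""
    return np.clip(y, 1.0, 2.0)


def D_C(y):
    return D(P_C(y), y)


def num_grad(fun, y, h=1e-6):
    """Central finite-difference gradient."""
    g = np.zeros_like(y)
    for i in range(y.size):
        e = np.zeros_like(y)
        e[i] = h
        g[i] = (fun(y + e) - fun(y - e)) / (2 * h)
    return g


y = np.array([2.7, -0.4])               # a point outside the box
formula = hess_diag(y) * (y - P_C(y))   # right-hand side of (76)
numeric = num_grad(D_C, y)
```

The arrays `formula` and `numeric` should agree to within finite-difference accuracy.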
7 Right Bregman Projections 



In this section, it will be convenient to write $D_f$ for the Bregman distance associated with $f$ (see 
(2)). Correspondingly, we write $\overleftarrow{P}^{f}_{C}$, $\overrightarrow{P}^{f}_{C}$ for the corresponding left and right projection operators. 
While $D_f$ is convex in its first argument, it is not necessarily so in its second argument. The 
properties of $\overrightarrow{P}^{f}_{C}$ can be studied by using $\overleftarrow{P}^{f^*}_{\nabla f(C)}$. 

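Proposition 7.1 Let $f \in \Gamma$ be Legendre and $C \subset \operatorname{int}\operatorname{dom} f$. Then for the right Bregman nearest 
point projection, we have 

(79) $\overrightarrow{P}^{f}_{C} = \nabla f^* \circ \overleftarrow{P}^{f^*}_{\nabla f(C)} \circ \nabla f$; 

or equivalently, 

(80) $\overleftarrow{P}^{f^*}_{\nabla f(C)} = \nabla f \circ \overrightarrow{P}^{f}_{C} \circ \nabla f^*$. 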



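Proof. By [2, Theorem 3.7(v)] (applied to $f^*$ rather than $f$), 

$D_{f^*}(x^*, y^*) = D_f(\nabla f^*(y^*), \nabla f^*(x^*)) \quad \forall x^*, y^* \in \operatorname{int}\operatorname{dom} f^*$. 

For every $y^* \in \operatorname{int}\operatorname{dom} f^*$, we thus have 

(81) $\overleftarrow{P}^{f^*}_{\nabla f(C)}(y^*) = \operatorname{argmin}_{x^* \in \nabla f(C)} D_{f^*}(x^*, y^*) = \operatorname{argmin}_{x^* \in \nabla f(C)} D_f(\nabla f^*(y^*), \nabla f^*(x^*))$ 

(82) $\quad = \nabla f\big(\overrightarrow{P}^{f}_{\nabla f^*(\nabla f(C))}(\nabla f^*(y^*))\big) = \nabla f\big(\overrightarrow{P}^{f}_{C}(\nabla f^*(y^*))\big)$ 

(83) $\quad = (\nabla f \circ \overrightarrow{P}^{f}_{C} \circ \nabla f^*)(y^*)$, 

which gives (80). Finally, we see that (79) is equivalent to (80) by using Fact [?], i.e., that $\nabla f$ and $\nabla f^*$ 
are mutually inverse for Legendre $f$. ■ 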
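Formula (79) lends itself to a brute-force check on a finite set. The sketch below assumes $f(x) = \sum_i e^{x_i}$ (the function used later in Example 7.5), so that $\nabla f = \exp$, $\nabla f^* = \log$, and $D_{f^*}$ is a Kullback-Leibler-type distance on the positive orthant. The finite set `C` and the point `y` are hypothetical test data.

```python
import numpy as np


def D_f(x, y):
    """Bregman distance of f(x) = sum(exp(x_i)); broadcasts over rows."""
    return np.sum(np.exp(x) - np.exp(y) - np.exp(y) * (x - y), axis=-1)


def D_fconj(u, v):
    """Bregman distance of f*(u) = sum(u_i ln u_i - u_i), for u, v > 0."""
    return np.sum(u * np.log(u / v) - u + v, axis=-1)


rng = np.random.default_rng(3)
C = rng.normal(size=(15, 2))  # hypothetical finite subset of int dom f = R^2
y = rng.normal(size=2)

# Right projection of y onto C: minimise D_f(y, c) over the second argument.
right_dists = D_f(y, C)
right_idx = int(np.argmin(right_dists))

# Left projection (for f*) of grad f(y) onto grad f(C): minimise over the
# first argument of D_{f*}.
left_dists = D_fconj(np.exp(C), np.exp(y))
left_idx = int(np.argmin(left_dists))

right_proj = C[right_idx]
via_dual = np.log(np.exp(C)[left_idx])  # apply grad f* = log, as in (79)
```

Both routes should select the same point of `C`, and the two distance arrays should coincide entry by entry, which is the duality identity used at the start of the proof.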



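Lemma 7.2 Let $f \in \Gamma$ be Legendre, let $C \subset \mathbb{R}^J$ be such that $\operatorname{cl} C \subset \operatorname{int}\operatorname{dom} f$, and assume that 
for every $y \in \operatorname{int}\operatorname{dom} f$, $\overleftarrow{P}^{f}_{C}(y) \neq \varnothing$. Then $C$ is closed. 

Proof. Assume that $(c_n)_{n=1}^{\infty}$ is a sequence in $C$, and $c_n \to y$. We need to show that $y \in C$. By 
assumption $y \in \operatorname{cl} C$, and $y \in U$. If $y \notin C$, then 

(84) $D_f(c, y) = f(c) - f(y) - \langle \nabla f(y), c - y \rangle > 0, \quad \forall c \in C$, 

by, e.g., [2, Theorem 3.7(iv)]. On the other hand, as $f$ is continuous on $U$, 

$0 \le \overleftarrow{D}^{f}_{C}(y) \le D_f(c_n, y) = f(c_n) - f(y) - \langle \nabla f(y), c_n - y \rangle \to 0$. 

Thus, $\overleftarrow{D}^{f}_{C}(y) = 0$. Using (84), we see that this contradicts the assumption that $\overleftarrow{P}^{f}_{C}(y) \neq \varnothing$. ■ 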

Theorem 7.3 Let $f \in \Gamma$ be Legendre, with full domain $\mathbb{R}^J$, and let $C \subset \mathbb{R}^J$ be closed with 
$\operatorname{cl}(\nabla f(C)) \subset \operatorname{int}\operatorname{dom} f^*$. Assume that $\overrightarrow{P}^{f}_{C}(y)$ is a singleton for every $y \in \mathbb{R}^J$. Then $\nabla f(C)$ is 
convex. 

Proof. We have that $f^*$ is Legendre and 1-coercive. By (80), $\overleftarrow{P}^{f^*}_{\nabla f(C)}(y)$ is single-valued for every 
$y \in \operatorname{int}\operatorname{dom} f^*$. As $\operatorname{cl}(\nabla f(C)) \subset \operatorname{int}\operatorname{dom} f^*$, Lemma 7.2 says that the set $\nabla f(C)$ is closed. Hence 
we apply Theorem 6.6 to $f^*$ and $\nabla f(C)$, and we obtain that $\nabla f(C)$ is convex. ■ 

Corollary 7.4 Let $f$ and $C$ satisfy A1-A3, assume that $f$ has full domain, and that $\overrightarrow{P}^{f}_{C}(y)$ is a 
singleton for every $y \in \mathbb{R}^J$. Then $\nabla f(C)$ is convex. 

The following example shows that even if $\overrightarrow{P}^{f}_{C}(y)$ is a singleton for every $y \in \operatorname{int}\operatorname{dom} f$, the set 
$C$ may fail to be convex. Thus, Theorem 6.6 fails for the right Bregman projection $\overrightarrow{P}^{f}_{C}$. Note that 
Theorem 7.3 allows us to conclude that $\nabla f(C)$ is convex rather than $C$. 

Example 7.5 Consider the Legendre function $f : \mathbb{R}^2 \to \mathbb{R}$ given by 

$f(x, y) := e^x + e^y \quad \forall (x, y) \in \mathbb{R}^2$, 

and its Fenchel conjugate 

$f^* : \mathbb{R}^2 \to \,]-\infty,+\infty] : (x, y) \mapsto x \ln x - x + y \ln y - y$ if $x \ge 0$, $y \ge 0$ (with $0 \ln 0 := 0$); and $+\infty$ otherwise. 

Define a compact convex set 

$C := [(0, 0), (1, 2)] = \{(\lambda, 2\lambda) : 0 \le \lambda \le 1\}$. 

As $\nabla f(x, y) = (e^x, e^y)$ for every $(x, y) \in \mathbb{R}^2$, we see that 

$\nabla f(C) = \{(e^{\lambda}, e^{2\lambda}) : 0 \le \lambda \le 1\}$ 

is compact but clearly not convex. 

(i) In view of Theorem 7.3 and the lack of convexity of $\nabla f(C)$, there must exist $(x, y) \in \mathbb{R}^2$ such 
that $\overrightarrow{P}^{f}_{C}(x, y)$ is multi-valued. 



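(ii) Since $\overleftarrow{P}^{f}_{C}(x, y)$ is a singleton for every $(x, y) \in \mathbb{R}^2$, and since $\overrightarrow{P}^{f^*}_{\nabla f(C)} = \nabla f \circ \overleftarrow{P}^{f}_{C} \circ \nabla f^*$ by 
Proposition 7.1 (applied to $f^*$ and $\nabla f(C)$), we deduce that $\overrightarrow{P}^{f^*}_{\nabla f(C)}$ is single-valued on $\operatorname{int}\operatorname{dom} f^* = 
\{(x, y) : x > 0, y > 0\}$. Therefore, the analogue of Theorem 6.6 for the right Bregman projection 
fails even though $f^*$ is Legendre and 1-coercive. 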
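The claim in Example 7.5 that $\nabla f(C)$ is not convex can be confirmed directly. The sketch below samples the segment $C$ (the grid size of 1001 points is an arbitrary choice), maps it through $\nabla f = \exp$, and checks that the image lies on the parabola $v = u^2$ while the midpoint of its two endpoints does not.

```python
import numpy as np

lam = np.linspace(0.0, 1.0, 1001)      # parameter of the segment
C = np.stack([lam, 2 * lam], axis=1)   # C = [(0, 0), (1, 2)]
image = np.exp(C)                      # grad f(C) = {(e^lam, e^(2 lam))}

# Every image point (u, v) satisfies v = u^2.
on_parabola = bool(np.allclose(image[:, 1], image[:, 0] ** 2))

# Midpoint of the endpoints (1, 1) and (e, e^2) of the image curve.
mid = 0.5 * (image[0] + image[-1])
# Vertical distance from the midpoint to the parabola; equals (e - 1)^2 / 4.
gap = float(mid[1] - mid[0] ** 2)
```

Since `gap` is strictly positive, the midpoint of two points of $\nabla f(C)$ lies off the curve containing $\nabla f(C)$, so $\nabla f(C)$ is not convex.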